Abstract

It is well known that Newman's modularity function $Q_N$ has the form $Q_N = Q_d - Q_0$,
where $Q_d$ is the intracluster edge density and $Q_0$ is a term corresponding to the null
model. Hence modularity maximization is influenced by $Q_d$, which favors a small number
of clusters, and by $-Q_0$, which favors balanced clusters. We show that the $-Q_0$ term can
cause not only underestimation of the cluster number (the well known "resolution limit" of
modularity) but, in certain cases, also overestimation. Furthermore, we construct families
of graphs, each of which has a "natural" community structure which, however, does not
maximize modularity. In fact, we show that we can always find a graph $G$ with a "natural
clustering" $\mathbf{V}$ and a sequence of clusterings $\mathbf{U}_x$ (with approximately equal-sized clusters)
such that the pair $(\mathbf{U}_x, G)$ has higher modularity than $(\mathbf{V}, G)$. More specifically, the pair
$(\mathbf{V}, G)$ has low "natural modularity" while the pair $(\mathbf{U}_x, G)$, by appropriate choice of $x$,
can achieve modularity arbitrarily close to one. In addition, $\mathbf{U}_x$ can be arbitrarily different
from the natural clustering $\mathbf{V}$; more specifically, by appropriate choice of $x$, their Jaccard
similarity can become arbitrarily close to zero.



1 Introduction 

Newman's modularity function $Q_N$ is probably the most popular quality function in the
community detection literature. A large number of community detection algorithms are based on
some form of modularity maximization. However, this approach can also run into problems,
as has been widely reported in the literature. This paper studies one particular way in which
modularity can fail, and can be summarized as follows.

We will use the term "cluster" as a synonym of "community"; a clustering is a partition of
the nodes of a graph; additional nomenclature and notation are introduced in Section 2.

Section 3 is devoted to an interpretation of $Q_N$. Modularity can be written in the form
$Q_N = Q_d - Q_0$, where $Q_d$ is the intracluster edge density and $Q_0$ is a term corresponding to
the null model. It follows that maximizing modularity is equivalent to maximizing the sum of
$Q_d$ and $-Q_0$. As explained in Section 3.2, the term $Q_d$ favors clusterings with few clusters and
many intracluster edges; as explained in Section 3.3, the term $-Q_0$ favors "balanced" clusterings
(i.e., clusterings with equally sized clusters) and is also responsible for cluster number selection.
The fact that modularity maximization yields an estimate of the number of clusters is generally
perceived as an advantage over other quality functions. However, it is well known that this
estimate can, in certain cases, be wrong. The literature has concentrated on the case where the
estimate is lower than the true number of clusters ("resolution limit" of modularity) but we give
examples of the opposite effect. These problems can be attributed to the previously mentioned
fact that $-Q_0$ favors balanced clusterings.

In Section 4 we exploit the behavior of $-Q_0$ and construct examples in which modularity
maximization yields arbitrarily inaccurate clusterings. More specifically, we construct families of
graphs $\{G_{K,N_1,N_2}\}_{K=1}^{\infty}$ (where $N_1, N_2$ are parameters of the graph) with the following properties.

1. Each graph $G_{K,N_1,N_2}$ has a "natural" clustering $\mathbf{V}_{K,N_1,N_2}$ which, however, does not
maximize modularity.

2. We can find graphs $G_{K,N_1,N_2}$ and clusterings $\mathbf{U}_{K,N_1,N_2,J}$ such that:

2.1. the pair $(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2})$ has higher modularity than $(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2})$;

2.2. the modularity of $(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2})$ is arbitrarily close to one;

2.3. the similarity between clusterings $\mathbf{V}_{K,N_1,N_2}$ and $\mathbf{U}_{K,N_1,N_2,J}$ is arbitrarily close to zero.

In Section 5 we discuss our results and (previously published) related work.

2 Preliminaries 

1. We deal with finite graphs without multiple edges. A graph $G$ is a pair $(V,E)$, where $V$
is the node set (we will always assume $V = \{1,2,\ldots,n\}$; hence the number of nodes is
$n = |V|$) and $E \subseteq V \times V$ is the edge set (and $m = |E|$ is the number of edges).

2. The adjacency matrix of $G$ is an $n \times n$ matrix $A$ with $A_{i,j} = 1$ iff $\{i,j\} \in E$ and $A_{i,j} = 0$
otherwise. There is a one-to-one correspondence between a graph $G$ and its adjacency
matrix $A$.

3. A clustering of $G = (V,E)$ is a partition $\mathbf{V} = \{V_1, \ldots, V_K\}$ of $V$ (i.e., $\bigcup_{k=1}^{K} V_k = V$ and
$\forall k \neq l: V_k \cap V_l = \emptyset$). The size of the clustering is $K$, the number of clusters. Given a graph
$G = (V,E)$, we denote by $\mathcal{V}$ the set of all clusterings of $V$ and by $\mathcal{V}_K$ the set of clusterings
of size $K$.

4. Each of the $V_k$'s is called a cluster or, synonymously, a community. For $k = 1, 2, \ldots, K$ we
set $n_k = |V_k|$ and $n'_k = |V - V_k|$. We have $\sum_{k=1}^{K} n_k = n$.

5. A clustering $\mathbf{V} = \{V_1, \ldots, V_K\}$ of the graph $G = (V,E)$ induces a family of edge sets
$\mathbf{E} = \{E_{11}, E_{12}, \ldots, E_{KK}\}$, where

$e = \{u,v\} \in E_{ij}$ iff $u \in V_i$ and $v \in V_j$.

Note that $E_{ij} = E_{ji}$. The elements of $E_{ii}$ (for $i \in \{1,\ldots,K\}$) will be called intracluster
edges; the elements of $E_{ij}$ (for $i \neq j$) will be called extracluster edges. We also write
$E_k = E_{kk}$.

6. The degree function is denoted by $\deg(\cdot)$ and is defined as follows: for any $v \in V$, $\deg(v) =
|\{e : v \in e\}|$ is the number of edges incident on $v$; for any $U \subseteq V$, $\deg(U) = \sum_{v \in U} \deg(v)$,
the sum of degrees of the nodes contained in $U$.

7. The Jaccard similarity index is defined as follows. Given any two clusterings $\mathbf{W}_1, \mathbf{W}_2$
define

$a_{11}$ = "num. of node pairs in same cluster under $\mathbf{W}_1$ and same cluster under $\mathbf{W}_2$";

$a_{10}$ = "num. of node pairs in same cluster under $\mathbf{W}_1$ and different cluster under $\mathbf{W}_2$";

$a_{01}$ = "num. of node pairs in different cluster under $\mathbf{W}_1$ and same cluster under $\mathbf{W}_2$";

$a_{00}$ = "num. of node pairs in different cluster under $\mathbf{W}_1$ and different cluster under $\mathbf{W}_2$".

Then the Jaccard similarity index $S(\mathbf{W}_1, \mathbf{W}_2 | G)$ (with respect to the graph $G$) is

$$S(\mathbf{W}_1, \mathbf{W}_2 | G) = \frac{a_{11}}{a_{10} + a_{01} + a_{11}}.$$

$S(\mathbf{W}_1, \mathbf{W}_2 | G)$ takes values in $[0, 1]$; values close to 1 show that $\mathbf{W}_1, \mathbf{W}_2$ are very similar;
values close to 0 show that they are very different. Note that the similarity of $\mathbf{W}_1$ and $\mathbf{W}_2$ is
computed with respect to $G$.
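
The pair counts above translate directly into a few lines of code. The following is a minimal sketch, not part of the original paper: clusterings are assumed to be given as label lists (`labels[v]` is the cluster of node `v`), the function name `jaccard_similarity` is hypothetical, and returning 1 when no pair is co-clustered in either clustering is a convention chosen here.

```python
from itertools import combinations


def jaccard_similarity(labels_1, labels_2):
    """Jaccard index S = a11 / (a10 + a01 + a11), counted over all node pairs."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both clusterings must label the same nodes")
    a11 = a10 = a01 = 0
    for u, v in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[u] == labels_1[v]
        same_2 = labels_2[u] == labels_2[v]
        if same_1 and same_2:
            a11 += 1
        elif same_1:
            a10 += 1
        elif same_2:
            a01 += 1
    denom = a10 + a01 + a11
    # Assumed convention: if no pair shares a cluster in either clustering,
    # the two (all-singleton) clusterings are identical, so return 1.
    return 1.0 if denom == 0 else a11 / denom


w1 = [0, 0, 0, 1, 1, 1]   # two clusters of three nodes
w2 = [0, 0, 1, 1, 2, 2]   # three clusters of two nodes
print(jaccard_similarity(w1, w2))
```

For these two label lists the counts are $a_{11} = 2$, $a_{10} = 4$, $a_{01} = 1$, so the index equals $2/7$.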



3 An Interpretation of Modularity
3.1 Modularity 

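Given a graph $G = (V, E)$ with adjacency matrix $A$, we denote the modularity of a clustering
$\mathbf{V}$ by $Q_N(\mathbf{V}, G)$ and, following [25], we define it by

$$Q_N(\mathbf{V}, G) = \frac{1}{2m} \sum_{i,j \in V} \left( A_{i,j} - \frac{\deg(i)\deg(j)}{2m} \right) \delta(i,j), \tag{1}$$

where $\delta(i,j) = 1$ if nodes $i$ and $j$ belong to the same cluster of $\mathbf{V}$ and $\delta(i,j) = 0$ otherwise.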

Our notation emphasizes that Qn (V, G) is a function of both the graph and the clustering. 

The motivation for introducing modularity can be seen by the following interpretation^1:
$Q_N(\mathbf{V}, G)$ is the difference of the fraction of the intracluster edges minus the expected value of
the same quantity in a graph (the null model) with the same clusters but random connections
between the nodes. A large value of $Q_N(\mathbf{V}, G)$ indicates that, under $\mathbf{V}$, $G$ is quite different from
the null model; this is taken as evidence of $G$ having "strong community structure" which is
"well captured" by $\mathbf{V}$. Hence modularity is a clustering quality function.

Other interpretations of modularity are possible; we will propose one a little later. But first
let us note that, in addition to characterizing a single $(\mathbf{V}, G)$ pair, modularity can be used to
compare clusterings: by definition, $\mathbf{V}$ is a better clustering of $G$ than $\mathbf{V}'$ iff $Q_N(\mathbf{V}, G) > Q_N(\mathbf{V}', G)$.
Taking this one step further, $\mathbf{V}^* = \arg\max_{\mathbf{V}} Q_N(\mathbf{V}, G)$ is the best clustering of $G$, hence
modularity maximization can be used to obtain graph clusterings (i.e., perform community detection);
and a large value of $\max_{\mathbf{V}} Q_N(\mathbf{V}, G)$ indicates that $G$ has strong community structure.

1 Which is a slight paraphrase of Newman and Girvan [25].
While modularity is widely used, its shortcomings have been widely reported in the literature.
For example, the modularity resolution limit has attracted a lot of attention [13, 14]; we will
discuss it in Section 3.3. But first let us note what appears to be a more basic limitation
of modularity. As already mentioned, a large $Q_N(\mathbf{V}, G)$ value indicates strong community
structure and good clustering; but what is a "large $Q_N(\mathbf{V}, G)$ value"? While it is known [7]
that $-\frac{1}{2} \le Q_N(\mathbf{V}, G) \le 1$, many examples appear in the community detection literature
[12, 13, 14] which have strong (intuitively perceived) community structure and yet their maximum
modularity is closer to zero than to one. The converse can also be true [1, 12]^2.

A frequently proposed explanation for the shortcomings of modularity is that the null model
assumption is not justified^3 [12]. In Section 3.3 we will consider an alternative explanation. But
first we will examine another clustering quality function.
3.2 Intracluster Edge Density 

A popular characterization of a network / graph community is that "there must be more edges
'inside' the community than edges linking vertices of the community with the rest of the graph"
[12, Section B.1]. We will take this as a basic guiding principle^4.

A prima facie reasonable way to quantify the principle is through the intracluster edge
density, denoted by $Q_d(\mathbf{V}, G)$ and defined by

$$Q_d(\mathbf{V}, G) = \frac{\sum_{k=1}^{K} |E_k|}{m}. \tag{2}$$

For every $G$ and $\mathbf{V}$, $Q_d(\mathbf{V}, G) \in [0, 1]$. A high (i.e., close to 1) value of $Q_d(\mathbf{V}, G)$ indicates that
the pair $(\mathbf{V}, G)$ has many intracluster and few extracluster edges.

Unfortunately, a high $Q_d(\mathbf{V}, G)$ value does not guarantee either that $G$ has strong community
structure or that $\mathbf{V}$ is a good clustering of $G$. Indeed we can always achieve $Q_d(\mathbf{V}, G) = 1$
by taking $\mathbf{V} = \{V\}$ (i.e., the unique clustering of size one) but this tells us nothing about the
"true" community structure of $G$. This observation can be generalized. First define the following
function

$$F_G(K) = \max_{\mathbf{V} \in \mathcal{V}_K} Q_d(\mathbf{V}, G). \tag{3}$$

For a given graph $G$, $F_G(K)$ is the maximum intracluster edge density achieved by clusterings
of size $K$. Now we can prove the following.

Theorem 3.1 For any graph $G = (V, E)$, $F_G(K)$ is a nonincreasing function of $K$.

Proof. There exists a single clustering of size one, namely $\mathbf{V}^{(1)} = \{V\}$. Denote the intracluster
edges by $E_1^{(1)}$; obviously $E_1^{(1)} = E$ (i.e., all edges are intracluster). Hence $F_G(1) = |E_1^{(1)}| / |E| = 1$.



2 In [12] it is stated that "In Section VI.C we have seen that high values of the modularity of Newman and
Girvan do not necessarily indicate that a graph has a definite cluster structure". And in [1]: "since fluctuations
can induce high modularity in random graphs, one must always approach the raw magnitude of Q with caution".

3 In [12] it is stated that "The weak point of the null model is the implicit assumption that each vertex can
interact with every other vertex, which implies that each part of the graph knows about everything else."

4 An extreme statement of this principle appears in [8]: "a community network $G_0 = (V; E_0)$ [is] a graph $G_0$
that is a disjoint union of complete subgraphs".
Let $\mathbf{V}^{(2)} = \{V_1^{(2)}, V_2^{(2)}\}$ be the optimal clustering of size two; the intracluster edges are $E_1^{(2)}$
and $E_2^{(2)}$. We have $E_1^{(2)} \cup E_2^{(2)} \subseteq E$ and $|E_1^{(2)}| + |E_2^{(2)}| \le |E|$. Hence

$$F_G(2) = Q_d(\mathbf{V}^{(2)}, G) = \frac{|E_1^{(2)}| + |E_2^{(2)}|}{|E|} \le 1 = F_G(1).$$

Let $\mathbf{V}^{(3)} = \{V_1^{(3)}, V_2^{(3)}, V_3^{(3)}\}$ be the optimal clustering of size three; the intracluster edges will
be $E_1^{(3)}, E_2^{(3)}, E_3^{(3)}$. Create the clustering (of size two) $\mathbf{V}' = \{V_1^{(3)}, V_2^{(3)} \cup V_3^{(3)}\}$ with intracluster
edges $E'_1, E'_2$. Note that $E_1^{(3)} = E'_1$ and $E_2^{(3)} \cup E_3^{(3)} \subseteq E'_2$. In other words, some edges which
were extracluster in $\mathbf{V}^{(3)}$ may become intracluster in $\mathbf{V}'$. Hence we have

$$|E_1^{(3)}| + |E_2^{(3)}| + |E_3^{(3)}| \le |E'_1| + |E'_2|$$

and

$$F_G(3) = Q_d(\mathbf{V}^{(3)}, G) = \frac{\sum_{k=1}^{3} |E_k^{(3)}|}{m} \le \frac{\sum_{k=1}^{2} |E'_k|}{m} = Q_d(\mathbf{V}', G) \le Q_d(\mathbf{V}^{(2)}, G) = F_G(2).$$

Proceeding in this manner we get $1 = F_G(1) \ge F_G(2) \ge \cdots \ge F_G(n) \ge 0$ and the proof is
complete. ■
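
Theorem 3.1 can be checked by brute force on a very small graph. The sketch below is an illustration only and is not taken from the paper: it enumerates every partition of the node set (feasible only for a handful of nodes), records the best intracluster edge density for each size $K$, and the helper names `set_partitions`, `intracluster_density` and `f_g` as well as the test graph (two triangles joined by one edge) are assumptions made for the example.

```python
def set_partitions(nodes):
    """Yield every partition of the list `nodes` as a list of blocks."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def intracluster_density(edges, partition):
    """Q_d: fraction of edges whose endpoints lie in the same block."""
    label = {v: k for k, block in enumerate(partition) for v in block}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)


def f_g(edges, n):
    """Return [F_G(1), ..., F_G(n)] by exhaustive search over all partitions."""
    best = {}
    for partition in set_partitions(list(range(n))):
        k = len(partition)
        best[k] = max(best.get(k, 0.0), intracluster_density(edges, partition))
    return [best[k] for k in range(1, n + 1)]


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
values = f_g(edges, 6)
print(values)
```

On this graph $F_G(1) = 1$, $F_G(2) = 6/7$ (the two triangles) and $F_G(6) = 0$, and the whole sequence is nonincreasing, as the theorem states.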

Hence for any $G$ the optimal $K$ (with respect to $Q_d$) is $K = 1$; it follows that $Q_d$ maximization
cannot determine the optimal number of clusters. However, if $K$ is given in advance, then
$\mathbf{V}^{(K)} = \arg\max_{\mathbf{V} \in \mathcal{V}_K} Q_d(\mathbf{V}, G)$ is a reasonable candidate for the best clustering of size $K$.
Actually this has often been expressed as a criticism of intracluster edge density, e.g., in [12] it
is stated that "Algorithms for graph partitioning are not good for community detection, because
it is necessary to provide as input the number of groups...". However, this criticism is valid only
to the extent that other algorithms exist which can obtain the number of groups (clusters).



3.3 Modularity as Augmented Intracluster Edge Density 

As seen in Section 3.1, an alleged advantage of modularity is that its maximization yields the
correct number of clusters; however, it is well known that there is a modularity resolution
limit [13, 14], i.e., in certain cases the maximum modularity clustering yields the wrong cluster
number. Why is this the case?

Our explanation will make use of the modularity formula

$$Q_N(\mathbf{V}, G) = \sum_{k=1}^{K} \frac{|E_k|}{m} - \sum_{k=1}^{K} \left( \frac{\deg(V_k)}{2m} \right)^2 \tag{4}$$

which, as is well known, is equivalent to (1). Comparing (2) and (4) we see that

$$Q_N(\mathbf{V}, G) = Q_d(\mathbf{V}, G) - Q_0(\mathbf{V}, G) \tag{5}$$
where

$$Q_0(\mathbf{V}, G) = \sum_{k=1}^{K} \left( \frac{\deg(V_k)}{2m} \right)^2. \tag{6}$$
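
The decomposition $Q_N = Q_d - Q_0$ is easy to evaluate numerically. The following minimal sketch computes the three quantities from an edge list and a label list; the function name `modularity_terms`, the input conventions and the toy graph are assumptions made for illustration, not notation from the paper.

```python
def modularity_terms(edges, labels):
    """Return (Q_d, Q_0, Q_N) for an undirected simple graph.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : labels[v] is the cluster of node v
    """
    m = len(edges)
    intra = 0
    cluster_degree = {}          # deg(V_k) for every cluster k
    for u, v in edges:
        if labels[u] == labels[v]:
            intra += 1
        for w in (u, v):         # each edge adds 1 to the degree of both endpoints
            cluster_degree[labels[w]] = cluster_degree.get(labels[w], 0) + 1
    q_d = intra / m
    q_0 = sum((d / (2 * m)) ** 2 for d in cluster_degree.values())
    return q_d, q_0, q_d - q_0


# Two triangles joined by one edge, clustered into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
q_d, q_0, q_n = modularity_terms(edges, labels)
print(q_d, q_0, q_n)
```

Here $m = 7$, six edges are intracluster and each cluster has total degree 7, so $Q_d = 6/7$, $Q_0 = 2 \cdot (7/14)^2 = 1/2$ and $Q_N = 6/7 - 1/2$.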



Hence modularity can be seen as an augmented intracluster edge density or, in other words, the
difference of intracluster edge density $Q_d(\mathbf{V}, G)$ and the auxiliary function $Q_0(\mathbf{V}, G)$. What is
the role of $Q_0(\mathbf{V}, G)$? Its introduction is usually justified with reference to the null model [25].
Here is another way to look at the matter.

Suppose momentarily that $K$ is given and we want to minimize $Q_0(\mathbf{V}, G) = \sum_{k=1}^{K} \left( \frac{\deg(V_k)}{2m} \right)^2$.
For simplicity of notation define $p_k = \frac{\deg(V_k)}{2m}$; we have

$$\sum_{k=1}^{K} p_k = \sum_{k=1}^{K} \frac{\deg(V_k)}{2m} = 1.$$

Hence we want to solve the following problem:

$$\text{given } K, \text{ minimize } \sum_{k=1}^{K} p_k^2 \text{ subject to: } 0 \le p_k \le 1 \text{ and } \sum_{k=1}^{K} p_k = 1. \tag{7}$$

Assuming for the moment that the $p_k$'s are continuously valued (this will be later relaxed), the
solution to (7) is $p_k = \frac{1}{K}$ for all $k$; the minimum thus achieved is $\frac{1}{K}$. If $K$ is not given, the
problem becomes

$$\text{minimize } \sum_{k=1}^{K} p_k^2 \text{ subject to: } K \in \{1, \ldots, n\}, \ 0 \le p_k \le 1 \text{ and } \sum_{k=1}^{K} p_k = 1. \tag{8}$$

Solving (8) separately for each $K \in \{1, \ldots, n\}$, we see that the overall optimal solution is $K = n$
(each cluster contains a single node) and $p_k = \frac{1}{n}$ for all $k$; the minimum thus achieved is $\frac{1}{n}$.
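
The claim that the equal weights $p_k = 1/K$ solve (7) is stated above without derivation. One standard way to see it (a sketch added here, using the Cauchy-Schwarz inequality, and not necessarily the argument the authors had in mind) is the following.

```latex
% Cauchy-Schwarz applied to the vectors (p_1,...,p_K) and (1,...,1):
\[
  1 \;=\; \Bigl(\sum_{k=1}^{K} p_k \cdot 1\Bigr)^{2}
    \;\le\; \Bigl(\sum_{k=1}^{K} p_k^{2}\Bigr)\Bigl(\sum_{k=1}^{K} 1^{2}\Bigr)
    \;=\; K \sum_{k=1}^{K} p_k^{2},
  \qquad\text{hence}\qquad
  \sum_{k=1}^{K} p_k^{2} \;\ge\; \frac{1}{K},
\]
% with equality iff all p_k are equal, i.e. p_k = 1/K for every k.
% The lower bound 1/K is decreasing in K, so over K in {1,...,n}
% the smallest attainable value is 1/n, reached at K = n.
```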

Let us now return to modularity maximization. We know that (a) $Q_N = Q_d - Q_0$, (b) $Q_d$
achieves its maximum at $K = 1$ and (c) $Q_0$ achieves its minimum at $K = n$. At an intuitive level,
it is easy to understand the factors influencing the outcome of modularity maximization: the $Q_d$
term pulls $K$ towards small values and the $-Q_0$ term towards large ones; in addition the $Q_d$ term favors
clusterings which correspond to the "natural" community structure of $G$, while the $-Q_0$ term favors
"balanced" clusterings in which all clusters contain more or less the same number of nodes. These
conclusions are based on the minimization problem (8), in which the $p_k$'s vary continuously. It
is reasonable to expect that, for large $n$, they will also be (at least approximately) true when
$p_k$ takes discrete values of the form $p_k = \frac{\deg(V_k)}{2m}$. This can be justified as follows: with a large
$V$, $\mathcal{V}$ is also large and it becomes likely that some clustering $\mathbf{V} = \{V_1, \ldots, V_K\}$ exists which can
make the terms $\frac{\deg(V_k)}{2m}$ approximately equal for all $k$ (of course, this will also depend on the
distribution of the degrees $\deg(v)$, since $\deg(V_k) = \sum_{v \in V_k} \deg(v)$). Hence intuitively we expect
that the role of the $Q_0$ term is to (a) increase the number of clusters and (b) equalize the sizes
of the clusters.

Similar remarks have appeared in previous works. For example in [37] it is stated that "the
existing modularity optimization method does not perform well in the presence of unbalanced
community structures" and in [13] it is stated that "For modularity's null model graphs, the
modularity maximum corresponds to an equipartition of the graph". An interesting point is
that the resolution limit has been identified as the tendency of modularity maximization to
underestimate the "true" number of clusters^5. But, as we will see in Section 4, modularity
maximization may also produce the opposite effect, i.e., overestimation of cluster number.

One method used to address the resolution limit is to introduce a modified modularity 
function. This function can often [THf |22| [26| 133] be written in the form 

$$\widetilde{Q}_N(\mathbf{V}, G) = Q_d(\mathbf{V}, G) - \gamma \, Q_0(\mathbf{V}, G)$$

where $\gamma$ is a "tuning parameter". With $\gamma = 1$, $\widetilde{Q}_N(\mathbf{V}, G) = Q_N(\mathbf{V}, G)$, the original Newman's
modularity. If this underestimates (resp. overestimates) the "true" number of clusters, formation
of more (resp. fewer) clusters can be encouraged by increasing (resp. decreasing) $\gamma$ and hence the influence
of the $-Q_0(\mathbf{V}, G)$ term on the maximization problem.

4 Bad Clusterings with High Modularity 

In this section we prove the existence of graphs which have (i) a "natural" clustering and (ii) 
a family of "arbitrarily bad" clusterings such that the arbitrarily bad clusterings achieve higher 
modularity than the natural one. In addition we show that the arbitrarily bad clusterings 
can achieve modularity arbitrarily close to one and they can be "arbitrarily different" from 
the natural clustering. (In the sequel we will explain precisely what we mean by the terms 
"natural", "arbitrarily bad" and "arbitrarily different".) We will establish all of these results 
using two parametric graph families. Obviously, these results indicate that, at least in certain 
cases, modularity is not a good quality function. 

4.1 First Example 

Let us construct a family of graphs $G_{K,N_1,N_2}$ ($K, N_1, N_2$ are parameters) such that the following
two properties are satisfied.

P1 For every $K, N_1, N_2$, the graph $G_{K,N_1,N_2}$ has an easily recognized "natural" clustering $\mathbf{V}_{K,N_1,N_2}$.

P2 We can select $K, N_1, N_2$ and a clustering $\mathbf{U}^*_{K,N_1,N_2}$ (different from $\mathbf{V}_{K,N_1,N_2}$) such that

$$Q_N(\mathbf{U}^*_{K,N_1,N_2}, G_{K,N_1,N_2}) > Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}). \tag{9}$$

Suppose $K, N_1, N_2$ are given. Let us first define the disconnected graph $G_{N_1,N_2}$ to be the
union of a path of $N_1$ nodes and a path of $N_2$ nodes; and then let the disconnected graph
$G_{K,N_1,N_2}$ be the union of $K$ disconnected copies of $G_{N_1,N_2}$. The construction is illustrated in
Figure 1.

5 For example in [13] it is stated that "The networks that we have examined are fairly small but the problem we
have discovered can only get worse if we increase the network size, especially when small communities coexist
with large ones and the module size distribution is broad, which seems to happen in many cases"; in [12] it is stated
that "modules identified through modularity optimization may actually be combinations of smaller modules".
Figure 1: Graph Family A

What is the natural clustering of $G_{K,N_1,N_2}$? We claim it is $\mathbf{V}_{K,N_1,N_2} = \{V_{K,N_1,N_2,1},
V_{K,N_1,N_2,2}, \ldots, V_{K,N_1,N_2,2K}\}$, where $V_{K,N_1,N_2,k}$ is the node set of the $k$-th disconnected component
of $G_{K,N_1,N_2}$ (with $k \in \{1, 2, \ldots, 2K\}$, see Figure 1). At the risk of belaboring the obvious, we note
that, if $u \in V_{K,N_1,N_2,i}$ and $v \in V_{K,N_1,N_2,j}$ and $i \neq j$, then there exists no path connecting $u$
and $v$; hence they should never be put in the same cluster. So the biggest possible clusters
are the $V_{K,N_1,N_2,i}$'s. On the other hand, there is no justification for splitting any $V_{K,N_1,N_2,i}$ into
smaller clusters, since all of its nodes have the same connectivity pattern. Hence $\mathbf{V}_{K,N_1,N_2}$ is the
"intuitively best" (i.e., the "natural") clustering of $G_{K,N_1,N_2}$.

For every triple $(K, N_1, N_2)$ let us now introduce a sequence $\{\mathbf{U}_{K,N_1,N_2,J}\}_{J=1}^{\infty}$ of clusterings
of $G_{K,N_1,N_2}$. For a fixed $J$, let $L = \lfloor n/J \rfloor$; writing for brevity $\mathbf{U}_J$ in place of $\mathbf{U}_{K,N_1,N_2,J}$, we let
$\mathbf{U}_J = \{U_1, \ldots, U_J, U_{J+1}\}$ consist of the following $J + 1$ clusters:

$$U_1 = \{1, \ldots, L\}, \ U_2 = \{L+1, \ldots, 2L\}, \ \ldots, \ U_J = \{(J-1)L+1, \ldots, JL\}, \ U_{J+1} = \{JL+1, \ldots, n\};$$

if $n = JL$ then $U_{J+1} = \emptyset$. In other words, $\mathbf{U}_J$ contains $J$ clusters each containing the same
number, $L = \lfloor n/J \rfloor$, of nodes and perhaps an additional cluster with fewer than $L$ nodes. Obviously
$\mathbf{U}_J$ is a "well balanced" clustering.

Our goal is to prove (9). We will do this in three steps, corresponding to the following three
propositions.

Lemma 4.1 For every $K, N_1, N_2 \in \mathbb{N}$ with $N_1, N_2 \ge 3$ we have

$$Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}) = 1 - \frac{(N_1 - 1)^2 + (N_2 - 1)^2}{K (N_1 + N_2 - 2)^2}. \tag{10}$$

Proof. We fix $K, N_1, N_2$ and, for brevity, we write $G$ for $G_{K,N_1,N_2}$ and $\mathbf{V}$ for $\mathbf{V}_{K,N_1,N_2}$. We have

$$Q_N(\mathbf{V}, G) = \frac{\sum_{k=1}^{2K} |E_k|}{m} - \frac{\sum_{k=1}^{2K} (\deg(V_k))^2}{(2m)^2}.$$

$G$ has no extracluster edges under $\mathbf{V}$, hence we have

$$\frac{\sum_{k=1}^{2K} |E_k|}{m} = 1. \tag{11}$$

We can separate $\mathbf{V}$ into two subsets of clusters: $\mathbf{V}' = \{V_1, V_3, \ldots, V_{2K-1}\}$ contains the clusters
with $N_1$ nodes and $\mathbf{V}'' = \{V_2, V_4, \ldots, V_{2K}\}$ contains the clusters with $N_2$ nodes. Each $V_k \in \mathbf{V}'$
has $N_1 - 2$ "inner nodes" of degree 2 and two "border nodes" of degree 1; similarly, each $V_k \in \mathbf{V}''$
has $N_2 - 2$ inner nodes of degree 2 and two border nodes of degree 1. Hence

$$\forall V_k \in \mathbf{V}': \ \deg(V_k) = 2(N_1 - 2) + 2 = 2(N_1 - 1),$$

$$\forall V_k \in \mathbf{V}'': \ \deg(V_k) = 2(N_2 - 2) + 2 = 2(N_2 - 1).$$

The total number of edges is

$$m = \frac{\sum_{V_k \in \mathbf{V}'} \deg(V_k) + \sum_{V_k \in \mathbf{V}''} \deg(V_k)}{2} = K(N_1 + N_2 - 2).$$

Also,

$$\frac{\sum_{k=1}^{2K} (\deg(V_k))^2}{(2m)^2} = \frac{\sum_{V_k \in \mathbf{V}'} (2(N_1 - 1))^2 + \sum_{V_k \in \mathbf{V}''} (2(N_2 - 1))^2}{(2K(N_1 + N_2 - 2))^2} = \frac{K(N_1 - 1)^2 + K(N_2 - 1)^2}{K^2 (N_1 + N_2 - 2)^2} = \frac{(N_1 - 1)^2 + (N_2 - 1)^2}{K (N_1 + N_2 - 2)^2}. \tag{12}$$

Combining (11) and (12) we get (10). ■



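Lemma 4.2 For every $K, N_1, N_2, J \in \mathbb{N}$ with $N_1, N_2 \ge 3$ we have

$$Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) \ge 1 - \frac{J}{K(N_1 + N_2 - 2)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1}. \tag{13}$$

Proof. We write $G$ for $G_{K,N_1,N_2}$, $\mathbf{V}$ for $\mathbf{V}_{K,N_1,N_2}$ and also $\mathbf{U}_J$ for $\mathbf{U}_{K,N_1,N_2,J}$. We have

$$Q_N(\mathbf{U}_J, G) = \frac{\sum_{k=1}^{J+1} |E_k|}{m} - \frac{\sum_{k=1}^{J+1} (\deg(U_k))^2}{(2m)^2}.$$

Consider first $\frac{\sum_{k=1}^{J+1} |E_k|}{m}$. A little thought shows that $\mathbf{U}_J$ has at most $J + 1$ clusters and $J$
extracluster edges. Hence

$$\forall J: \ \frac{\sum_{k=1}^{J+1} |E_k|}{m} \ge \frac{m - J}{m} = 1 - \frac{J}{m} = 1 - \frac{J}{K(N_1 + N_2 - 2)}. \tag{14}$$

Consider now $\frac{\sum_{k=1}^{J+1} (\deg(U_k))^2}{(2m)^2}$. Each $U_k$ has no more than $\frac{n}{J} = \frac{K(N_1 + N_2)}{J}$ nodes and each node has
degree at most 2. Hence

$$\forall J: \ \frac{\sum_{k=1}^{J+1} (\deg(U_k))^2}{(2m)^2} \le \frac{(J+1) \left( 2 \frac{K(N_1 + N_2)}{J} \right)^2}{2^2 K^2 (N_1 + N_2 - 2)^2} = \frac{J+1}{J} \cdot \frac{(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1} \le \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1} \tag{15}$$

(since $\forall J \in \mathbb{N}: \frac{J+1}{J} \le 2$). Combining (14) and (15) we get (13). ■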



Hence, to ensure $Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) > Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2})$ (i.e., that the natural
clustering $\mathbf{V}_{K,N_1,N_2}$ has lower modularity than $\mathbf{U}_{K,N_1,N_2,J}$), it suffices to select $K, N_1, N_2, J$
appropriately and use Lemmas 4.1 and 4.2. A sufficient condition, obtained from (10) and (13),
is

$$1 - \frac{J}{K(N_1 + N_2 - 2)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1} > 1 - \frac{(N_1 - 1)^2 + (N_2 - 1)^2}{K (N_1 + N_2 - 2)^2}. \tag{16}$$

Inspecting (16) we see that one way to satisfy it is by fixing $N_1$ and letting $J$ be "sufficiently
larger" than $K$ and $N_2$ "sufficiently larger" than $J$. This is the main idea used in the proof of
the following theorem.

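Theorem 4.3 For every $K \in \mathbb{N}$ and $\varepsilon \in (0, \frac{1}{K})$ there exist $N_1, N_2, J \in \mathbb{N}$ (depending on $\varepsilon$ and
$K$) such that

$$Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) > 1 - \varepsilon > 1 - \frac{1}{K} \simeq Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}), \tag{17}$$

$$\varepsilon > S(\mathbf{U}_{K,N_1,N_2,J}, \mathbf{V}_{K,N_1,N_2} | G_{K,N_1,N_2}). \tag{18}$$

Proof. Take any $K$ and $\varepsilon$ and let $N_1 = 3$, $J = xK$, $N_2 = x^2 K$. To prove (17) note that, by (13) and (10),

$$Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) \ge 1 - \frac{x}{1 + x^2 K} - \frac{2(3 + x^2 K)^2}{(1 + x^2 K)^2 xK},$$

$$Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}) = 1 - \frac{4 + (x^2 K - 1)^2}{K(1 + x^2 K)^2}.$$

We have

$$1 - \frac{x}{1 + x^2 K} - \frac{2(3 + x^2 K)^2}{(1 + x^2 K)^2 xK} = 1 - \frac{3}{xK} + o\left(\frac{1}{x}\right),$$

$$1 - \frac{4 + (x^2 K - 1)^2}{K(1 + x^2 K)^2} = 1 - \frac{1}{K} + o(1).$$

Hence, for (17) to hold, $x$ must be big enough for the $o(\cdot)$ terms to be negligible and we also need

$$1 - \frac{3}{xK} > 1 - \varepsilon > 1 - \frac{1}{K}$$

which is satisfied for any $x > \frac{3}{K\varepsilon}$. In short, we can satisfy (17) for every $K \in \mathbb{N}$ and
$\varepsilon \in (0, \frac{1}{K})$, by taking $x$ sufficiently big and $N_1 = 3$, $J = xK$, $N_2 = x^2 K$.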

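Let us prove (18). Let $b$ (resp. $c$) be the number of node pairs in the same cluster under
$\mathbf{U}_{K,N_1,N_2,J}$ (resp. under $\mathbf{V}_{K,N_1,N_2}$). We obviously have $b = a_{10} + a_{11} \ge a_{11}$ and $a_{10} + a_{01} + a_{11} \ge
a_{01} + a_{11} = c > 0$. Hence

$$S(\mathbf{U}_{K,N_1,N_2,J}, \mathbf{V}_{K,N_1,N_2} | G_{K,N_1,N_2}) = \frac{a_{11}}{a_{10} + a_{01} + a_{11}} \le \frac{b}{c}.$$

We will obtain an upper bound for $b$. Since each $U_k$ contains no more than $\frac{n}{J}$ nodes,
the number of node pairs that can be formed in $U_k$ is no more than $\frac{1}{2} \frac{n}{J} \left( \frac{n}{J} - 1 \right) < \frac{n^2}{2J^2}$. Also,
$n = K(N_1 + N_2)$ so, for big $N_2$, $\frac{n^2}{2J^2} < \frac{K^2 N_2^2}{J^2}$. There are at most $J + 1$ clusters, so we have

$$b < (J + 1) \frac{K^2 N_2^2}{J^2} = (xK + 1) x^2 K^2 = K^3 x^3 + K^2 x^2.$$

Now we will compute $c$. In $\mathbf{V}_{K,N_1,N_2}$ there exist $K$ clusters of $N_1 = 3$ nodes; each has $\frac{N_1(N_1 - 1)}{2}$
node pairs; there also exist $K$ clusters of $N_2$ nodes; each has $\frac{N_2(N_2 - 1)}{2}$ node pairs. We have

$$c = K \frac{N_1(N_1 - 1)}{2} + K \frac{N_2(N_2 - 1)}{2} = 3K + \frac{x^2 K^2 (x^2 K - 1)}{2}.$$

And so we have

$$0 \le S(\mathbf{U}_{K,N_1,N_2,J}, \mathbf{V}_{K,N_1,N_2} | G_{K,N_1,N_2}) \le \frac{K^3 x^3 + K^2 x^2}{3K + \frac{1}{2} x^2 K^2 (x^2 K - 1)}$$

and

$$0 \le \lim_{x \to \infty} S(\mathbf{U}_{K,N_1,N_2,J}, \mathbf{V}_{K,N_1,N_2} | G_{K,N_1,N_2}) \le \lim_{x \to \infty} \frac{K^3 x^3 + K^2 x^2}{3K + \frac{1}{2} x^2 K^2 (x^2 K - 1)} = 0.$$

Hence, for $x$ large enough (18) is satisfied. ■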

Remark. We see from (17) that we can always find a clustering $\mathbf{U}_{K,N_1,N_2,J}$ which achieves
modularity arbitrarily close to one and, in addition, better than the natural clustering $\mathbf{V}_{K,N_1,N_2}$.
Note that here "close to one" means greater than $1 - \varepsilon$, where $\varepsilon$ can get arbitrarily small
independently of $K$. On the other hand, by (10), $Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}) \simeq 1 - \frac{1}{K}$ for large $N_2$, which can become rather
small when $K$ takes small values. In other words, $G_{K,N_1,N_2}$ has low "natural modularity" (the
one achieved by the pair $(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2})$) but it can be assigned high "artificial modularity"
(the one achieved by the pair $(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2})$).

Remark. Furthermore, from (18) we see that $\mathbf{U}_{K,N_1,N_2,J}$ is very different from $\mathbf{V}_{K,N_1,N_2}$ (according
to the Jaccard similarity; similar results can also be proved for other clustering similarity
indices, for example the information theoretic distance used by Danon [9]). In a sense, this is
not surprising: recall that $J = xK$ and we can choose $x$ arbitrarily large; hence $\mathbf{U}_{K,N_1,N_2,J}$ will
have $xK$ clusters, which is many more than the $2K$ clusters contained in $\mathbf{V}_{K,N_1,N_2}$ (note that
for the $\{G_{K,N_1,N_2}\}_{K=1}^{\infty}$ family modularity maximization overestimates the number of clusters).
Example. Let us give some numerical examples of the above ideas.

1. Taking $K = 2$, $x = 4$ and $N_1 = 3$ we also get $J = xK = 8$ and $N_2 = x^2 K = 32$. With
these values we get

$$Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}) = 0.5569 = 1 - \frac{(N_1 - 1)^2 + (N_2 - 1)^2}{K (N_1 + N_2 - 2)^2},$$

$$Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) = 0.7657 > 0.5976 = 1 - \frac{J}{K(N_1 + N_2 - 2)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1};$$

both (16) and (17) are satisfied.

2. Taking $K = 4$, $x = 10$ and $N_1 = 3$ we also get $J = xK = 40$ and $N_2 = x^2 K = 400$. With
these values we get

$$Q_N(\mathbf{V}_{K,N_1,N_2}, G_{K,N_1,N_2}) = 0.7525 = 1 - \frac{(N_1 - 1)^2 + (N_2 - 1)^2}{K (N_1 + N_2 - 2)^2},$$

$$Q_N(\mathbf{U}_{K,N_1,N_2,J}, G_{K,N_1,N_2}) = 0.9504 > 0.9246 = 1 - \frac{J}{K(N_1 + N_2 - 2)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 2)^2} J^{-1};$$

again (16) and (17) are satisfied.
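
The first numerical example can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' code: nodes are numbered consecutively along the paths (0-indexed here), the helper names `modularity`, `family_a` and `balanced_labels` are hypothetical, and modularity is evaluated through the form $Q_N = Q_d - Q_0$ of Section 3.3.

```python
def modularity(edges, labels):
    """Q_N = Q_d - Q_0 for an undirected simple graph (each edge listed once)."""
    m = len(edges)
    intra = sum(1 for u, v in edges if labels[u] == labels[v])
    cluster_degree = {}
    for u, v in edges:
        for w in (u, v):
            cluster_degree[labels[w]] = cluster_degree.get(labels[w], 0) + 1
    q_0 = sum((d / (2 * m)) ** 2 for d in cluster_degree.values())
    return intra / m - q_0


def family_a(K, N1, N2):
    """Edges and natural labels of K copies of (path on N1 nodes, path on N2 nodes)."""
    edges, natural, start, cluster = [], [], 0, 0
    for _ in range(K):
        for size in (N1, N2):
            edges += [(v, v + 1) for v in range(start, start + size - 1)]
            natural += [cluster] * size
            start += size
            cluster += 1
    return edges, natural


def balanced_labels(n, J):
    """Clustering U_J: consecutive blocks of L = floor(n / J) nodes (needs J <= n)."""
    L = n // J
    return [v // L for v in range(n)]


K, x, N1 = 2, 4, 3
J, N2 = x * K, x * x * K
edges, natural = family_a(K, N1, N2)
q_v = modularity(edges, natural)
q_u = modularity(edges, balanced_labels(len(natural), J))
s = N1 + N2 - 2
closed_form_10 = 1 - ((N1 - 1) ** 2 + (N2 - 1) ** 2) / (K * s ** 2)
lower_bound_13 = 1 - J / (K * s) - 2 * (N1 + N2) ** 2 / (s ** 2 * J)
print(round(q_v, 4), round(closed_form_10, 4))
print(round(q_u, 4), round(lower_bound_13, 4))
```

With these parameters the script gives 0.5569 for the natural clustering (matching the closed form (10)) and 0.7657 for the balanced clustering, above the lower bound 0.5976 from (13). Changing `K` and `x` lets one explore other members of the family.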



4.2 Second Example 



It may be argued that the results of Section 4.1 are only possible because we have used
disconnected graphs. This is not the case. In this section we will illustrate the same deficiency using
a family of connected graphs. We introduce the connected graphs $H_{N_1,N_2}$ and $H_{K,N_1,N_2}$ illustrated
in Figure 2. Each $H_{N_1,N_2}$ graph is a path of $N_1 + N_2$ nodes, with extra edges added between the
first $N_1$ (resp. the second $N_2$) nodes at distance two of each other. The graph $H_{K,N_1,N_2}$ is constructed
by joining $K$ $H_{N_1,N_2}$ subgraphs in series.




Figure 2: Graph Family B

We will use the same clusterings $\mathbf{V}_{K,N_1,N_2}$ and the sequences of clusterings $\{\mathbf{U}_{K,N_1,N_2,J}\}_{J=1}^{\infty}$
as in Section 4.1. Once again, we claim that $\mathbf{V}_{K,N_1,N_2}$ is the natural clustering of $H_{K,N_1,N_2}$, for
reasons similar to the ones discussed previously. Namely, cluster boundaries should occur across
edges incident on the most weakly connected nodes; this shows that the $V_{K,N_1,N_2,k}$ clusters must
be preserved; any partition of $V_{K,N_1,N_2,k}$ into finer clusters cannot be justified, since all of its
nodes have the same connectivity pattern. Hence $\mathbf{V}_{K,N_1,N_2}$ is the "intuitively best" (i.e., the
"natural") clustering of $H_{K,N_1,N_2}$.



Once again, we obtain a result similar to Theorem 4.3 in three steps. 
Lemma 4.4 For every $K, N_1, N_2 \in \mathbb{N}$ with $N_1, N_2 \ge 5$ we have

$$Q_N(\mathbf{V}_{K,N_1,N_2}, H_{K,N_1,N_2}) < 1 - \frac{K \left( (4N_1 - 8)^2 + (4N_2 - 8)^2 \right)}{(4K(N_1 + N_2 - 2))^2}. \tag{19}$$

Proof. We fix $K, N_1, N_2$ and, for brevity, we write $H$ for $H_{K,N_1,N_2}$ and $\mathbf{V}$ for $\mathbf{V}_{K,N_1,N_2}$; $\mathbf{V}'$ and
$\mathbf{V}''$ have the same meaning as previously. For each $V_k \in \mathbf{V}'$, there are two border nodes on the
left, two border nodes on the right and $N_1 - 4$ inner nodes. Each of the inner nodes has degree
4; each of the border nodes has degree 3, except for the first and last node of the graph, which
have degree 2. Hence for each $V_k \in \mathbf{V}'$ we have the bounds

$$(N_1 - 4) \cdot 4 + 4 \cdot 2 = 4N_1 - 8 \le \deg(V_k) \le 4N_1 - 4 = (N_1 - 4) \cdot 4 + 4 \cdot 3.$$

Similarly, for each $V_k \in \mathbf{V}''$ we have the bounds

$$(N_2 - 4) \cdot 4 + 4 \cdot 2 = 4N_2 - 8 \le \deg(V_k) \le 4N_2 - 4 = (N_2 - 4) \cdot 4 + 4 \cdot 3.$$

Hence

$$K \left( (4N_1 - 8)^2 + (4N_2 - 8)^2 \right) \le \sum_{k=1}^{2K} (\deg(V_k))^2 \le K \left( (4N_1 - 4)^2 + (4N_2 - 4)^2 \right). \tag{20}$$

The total number of edges is $m = \frac{1}{2} \sum_{k=1}^{2K} \deg(V_k)$ and hence we have

$$\frac{K(4N_1 - 8 + 4N_2 - 8)}{2} \le \frac{\sum_{k=1}^{2K} \deg(V_k)}{2} \le \frac{K(4N_1 - 4 + 4N_2 - 4)}{2},$$

$$2K(N_1 + N_2 - 4) \le m \le 2K(N_1 + N_2 - 2). \tag{21}$$

In $\mathbf{V}_{K,N_1,N_2}$ there exist $2K - 1$ extracluster edges, so we have

$$\frac{\sum_{k=1}^{2K} |E_k|}{m} < 1. \tag{22}$$

Combining (20) and (21) we get

$$\frac{\sum_{k=1}^{2K} (\deg(V_k))^2}{(2m)^2} \ge \frac{K \left( (4N_1 - 8)^2 + (4N_2 - 8)^2 \right)}{(4K(N_1 + N_2 - 2))^2}. \tag{23}$$

Finally, combining (22) and (23) we get the required bound. ■

Lemma 4.5 For every $K, N_1, N_2, J \in \mathbb{N}$ with $N_1, N_2 \ge 3$ we have

$$Q_N(\mathbf{U}_J, H_{K,N_1,N_2}) \ge 1 - \frac{3J}{2K(N_1 + N_2 - 4)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 4)^2} J^{-1}. \tag{24}$$
Proof. Each $U_k$ has at most 6 extracluster edges, but these are counted twice; hence $\mathbf{U}_J$ cannot
have more than $3J$ extracluster edges and so

$$\frac{\sum_{k=1}^{J+1} |E_k|}{m} \ge \frac{m - 3J}{m} = 1 - \frac{3J}{m} \ge 1 - \frac{3J}{2K(N_1 + N_2 - 4)}. \tag{25}$$

Each $U_k$ has at most $\frac{n}{J} = \frac{K(N_1 + N_2)}{J}$ nodes and each node has degree at most 4. Hence

$$\frac{\sum_{k=1}^{J+1} (\deg(U_k))^2}{(2m)^2} \le \frac{(J + 1) \left( 4 \frac{K(N_1 + N_2)}{J} \right)^2}{(4K(N_1 + N_2 - 4))^2} \le \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 4)^2} J^{-1}. \tag{26}$$

Combining (25) and (26) we get the bound (24). ■

Hence, to ensure $Q_N(\mathbf{U}_{K,N_1,N_2,J}, H_{K,N_1,N_2}) > Q_N(\mathbf{V}_{K,N_1,N_2}, H_{K,N_1,N_2})$ it suffices to choose
appropriate $K, N_1, N_2, J$ and use Lemmas 4.4 and 4.5. A sufficient condition, obtained from
(19) and (24), is

$$1 - \frac{3J}{2K(N_1 + N_2 - 4)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 4)^2} J^{-1} > 1 - \frac{K \left( (4N_1 - 8)^2 + (4N_2 - 8)^2 \right)}{(4K(N_1 + N_2 - 2))^2}. \tag{27}$$

Now we can prove the following.

Theorem 4.6 For every $K \in \mathbb{N}$ and $\epsilon \in \left(0, \frac{1}{K}\right)$ there exist $N_1, N_2, J \in \mathbb{N}$ (depending on $\epsilon$, $K$)
such that

\[ Q_N(U_{K,N_1,N_2,J}, H_{K,N_1,N_2}) > 1 - \epsilon > Q_N(V_{K,N_1,N_2}, H_{K,N_1,N_2}), \tag{28} \]

\[ \epsilon > S(U_{K,N_1,N_2,J}, V_{K,N_1,N_2} \mid H_{K,N_1,N_2}). \tag{29} \]

Proof. Take any $K$ and $\epsilon$. Letting $N_1 = 6$, $J = xK$, $N_2 = x^2 K$ we have

\[ Q_N(U_{K,N_1,N_2,J}, H_{K,N_1,N_2}) > 1 - \frac{3xK}{2K(2 + x^2 K)} - \frac{2(6 + x^2 K)^2}{xK(2 + x^2 K)^2}, \]

\[ Q_N(V_{K,N_1,N_2}, H_{K,N_1,N_2}) < 1 - \frac{K\left(16^2 + (4x^2 K - 8)^2\right)}{(4K(4 + x^2 K))^2}. \]

For large enough $x$ we have

\[ 1 - \frac{K\left(16^2 + (4x^2 K - 8)^2\right)}{(4K(4 + x^2 K))^2} = 1 - \frac{16 + (x^2 K - 2)^2}{K(4 + x^2 K)^2} = 1 - \frac{1}{K} + o(x), \]

\[ 1 - \frac{3xK}{2K(2 + x^2 K)} - \frac{2(6 + x^2 K)^2}{xK(2 + x^2 K)^2} = 1 - \frac{3}{2Kx} - \frac{2}{Kx} + o(x) = 1 - \frac{7}{2Kx} + o(x). \]

Hence, for (28) to hold, $x$ must be big enough for $o(x)$ to be negligible and we also need

\[ 1 - \frac{7}{2Kx} > 1 - \epsilon > 1 - \frac{1}{K}, \]

which is satisfied for any $x > \frac{7}{2K\epsilon}$. It follows that (28) holds for every $K \in \mathbb{N}$ and $\epsilon \in \left(0, \frac{1}{K}\right)$,
if $x$ is sufficiently big and $N_1 = 6$, $J = xK$, $N_2 = x^2 K$. Hence we have proved (28). We can also
prove that (29) is satisfied for big enough $x$; the argument is similar to the one used for (18) and
hence is omitted. ■
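
The choice of $x$ in the proof can be explored numerically. The following Python sketch is not part of the original argument: it evaluates the right-hand sides of (24) and of the Lemma 4.4 bound for $N_1 = 6$, $J = xK$, $N_2 = x^2 K$ and searches for the smallest integer $x$ at which the two bounds enclose $1 - \epsilon$. The function names and the search cap `x_max` are illustrative assumptions.

```python
def lower_bound_u(K, N1, N2, J):
    """Right-hand side of (24): a lower bound on Q_N(U, H)."""
    s = N1 + N2
    return 1 - 3 * J / (2 * K * (s - 4)) - 2 * s**2 / ((s - 4) ** 2 * J)


def upper_bound_v(K, N1, N2):
    """Lemma 4.4 bound: an upper bound on Q_N(V, H)."""
    num = K * ((4 * N1 - 8) ** 2 + (4 * N2 - 8) ** 2)
    return 1 - num / (4 * K * (N1 + N2 - 2)) ** 2


def smallest_x(K, eps, x_max=10_000):
    """Smallest integer x with lower_bound_u > 1 - eps > upper_bound_v.

    Uses the parametrization of the proof: N1 = 6, J = x*K, N2 = x^2*K.
    Returns None if no x up to x_max (an arbitrary cap) works.
    """
    for x in range(1, x_max + 1):
        N1, J, N2 = 6, x * K, x * x * K
        if N2 <= 3:  # Lemma 4.5 assumes N1, N2 > 3
            continue
        if lower_bound_u(K, N1, N2, J) > 1 - eps > upper_bound_v(K, N1, N2):
            return x
    return None


K, eps = 2, 0.25
x = smallest_x(K, eps)
print("asymptotic threshold 7/(2*K*eps):", 7 / (2 * K * eps))
print("smallest integer x found by search:", x)
```

For this particular choice of $K$ and $\epsilon$ the exact search returns a value slightly above the asymptotic threshold $7/(2K\epsilon)$, which is what one would expect while the $o(x)$ terms are not yet negligible.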

Example. Here are two numerical examples. 

1. Taking $K = 2$, $x = 5$ and $N_1 = 5$ we also get $J = xK = 10$ and $N_2 = x^2 K = 50$. With
these values we get

\[ Q_N(V_{K,N_1,N_2}, H_{K,N_1,N_2}) = 0.5538 < 0.5883 = 1 - \frac{K\left((4N_1 - 8)^2 + (4N_2 - 8)^2\right)}{(4K(N_1 + N_2 - 2))^2}, \]

\[ Q_N(U_{K,N_1,N_2,J}, H_{K,N_1,N_2}) = 0.7811 > 0.6203 = 1 - \frac{3J}{2K(N_1 + N_2 - 4)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 4)^2} J^{-1}; \]

both (27) and (28) are satisfied.



2. Taking $K = 4$, $x = 10$ and $N_1 = 5$ we also get $J = xK = 40$ and $N_2 = x^2 K = 400$. With
these values we get

\[ Q_N(V_{K,N_1,N_2}, H_{K,N_1,N_2}) = 0.7527 < 0.7562 = 1 - \frac{K\left((4N_1 - 8)^2 + (4N_2 - 8)^2\right)}{(4K(N_1 + N_2 - 2))^2}, \]

\[ Q_N(U_{K,N_1,N_2,J}, H_{K,N_1,N_2}) = 0.9382 > 0.9116 = 1 - \frac{3J}{2K(N_1 + N_2 - 4)} - \frac{2(N_1 + N_2)^2}{(N_1 + N_2 - 4)^2} J^{-1}; \]

again both (27) and (28) are satisfied.
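
The bound values quoted in the two examples can be reproduced from the closed-form expressions alone. The short Python sketch below does only that: it evaluates the Lemma 4.4 upper bound and the Lemma 4.5 lower bound for the stated $K$, $x$, $N_1$. It does not construct the graphs, so the modularity values 0.5538, 0.7811, 0.7527 and 0.9382 are taken from the text and are not recomputed. The helper name `bounds_for_example` is an illustrative assumption.

```python
def bounds_for_example(K, x, N1):
    """Return (upper bound on Q_N(V,H), lower bound on Q_N(U,H)).

    J and N2 are derived from x as in the examples: J = x*K, N2 = x^2*K.
    """
    J, N2 = x * K, x * x * K
    s = N1 + N2
    # Lemma 4.4: 1 - K((4N1-8)^2 + (4N2-8)^2) / (4K(N1+N2-2))^2
    upper_v = 1 - K * ((4 * N1 - 8) ** 2 + (4 * N2 - 8) ** 2) / (4 * K * (s - 2)) ** 2
    # Lemma 4.5: 1 - 3J / (2K(N1+N2-4)) - 2(N1+N2)^2 / ((N1+N2-4)^2 J)
    lower_u = 1 - 3 * J / (2 * K * (s - 4)) - 2 * s**2 / ((s - 4) ** 2 * J)
    return upper_v, lower_u


for K, x, N1 in [(2, 5, 5), (4, 10, 5)]:
    upper_v, lower_u = bounds_for_example(K, x, N1)
    print(f"K={K}, x={x}: upper bound for V = {upper_v:.4f}, "
          f"lower bound for U = {lower_u:.4f}, (27) holds: {lower_u > upper_v}")
```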



5 Discussion and Related Work 



Combining Theorems 4.3 and 4.6 we see that for each of our $G_{K,N_1,N_2}$ and $H_{K,N_1,N_2}$ graphs
there always exists a clustering $U_{K,N_1,N_2,J}$ which achieves higher modularity than the natural
clustering $V_{K,N_1,N_2}$; also $U_{K,N_1,N_2,J}$ is arbitrarily different from $V_{K,N_1,N_2}$. It is conceivable
that the maximum modularity is achieved at some other clustering $W_{K,N_1,N_2}$ which is not very
different from $V_{K,N_1,N_2}$. However, until such a clustering is discovered, Theorems 4.3 and 4.6
indicate that, at least in some cases, the modularity Qn is not a reliable clustering quality
function.

In particular, recall that Qn was introduced as "a measure for the strength of the community 
structure found by our algorithms, which gives us an objective metric for choosing the number 
of communities into which a network should be divided" [25]. However, we have examples of 
modularity maximization both underestimating (see [TBI [14]) and overestimating (see Section
4) the true number of clusters.

We have just used the term "true number of clusters" and in earlier parts of this paper the 
terms "best clustering", "natural clustering", etc. However, as remarked by several researchers
[12], these terms are, to a great extent, ambiguous. For "best" to have an objective meaning, it
must be defined in terms of a cluster quality function; what is then a good quality function?

If our goal is to evaluate a particular quality function Q1 (by checking that it selects the
best clustering) then obviously we cannot use the same function to determine what is "best"; if
we use another quality function Q2 to determine "best", then we are only checking whether Q1
and Q2 agree. But this gets us no closer to determining whether they select the true clustering.
The approach is rather self-referential. 

Another possibility is to use labeled data, so that we know the true clustering in advance
[36]. However, labeled data are not easy to get. In addition, there is always the possibility that
labels (i.e., cluster membership) have been determined by external factors. Hoping that these
factors will be sufficiently reflected in the connectivity of the graph seems overly optimistic.
Indeed, for two of the most popular real world graph benchmarks (the Zachary karate club and
football association network) it has been shown [35] that clusterings exist which have higher
modularity than the true ones.

It appears that the only avenue left to evaluate clustering quality functions is to use graphs 
which have a "natural" (i.e., intuitively clear) community structure. This approach has been 
used in the construction of several benchmark graph families [21 GDI EI] and is also the one we
have used in Section 4.

The same problems arise in trying to estimate the "true number of clusters". At any rate, 
it is clear that in modularity maximization the selection of cluster number is effected (more or 
less accurately) by the Qo term. Several "modified modularity functions" have been proposed
[31 [THl [221 1331 EE] which control the effect of the Qo term by multiplying it with a tuning
parameter $\gamma$; but these variants may also be plagued by a resolution limit [51 [191 US].
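
To make the role of the tuning parameter concrete, the following Python sketch computes the two terms of modularity for a given clustering, using the form $Q_D = \sum_k |E_k| / m$ and $Q_0 = \sum_k (\deg(V_k))^2 / (2m)^2$ that appears in the bounds of Section 4, and then forms $Q_D - \gamma Q_0$. The function names and the toy graph (two triangles joined by one edge) are illustrative assumptions and are not taken from the cited variants, whose exact definitions may differ.

```python
def modularity_terms(edges, clusters):
    """Return (Q_D, Q_0) for an undirected simple graph given as an edge list.

    Q_D: fraction of edges lying inside clusters.
    Q_0: sum over clusters of (total cluster degree)^2, divided by (2m)^2.
    """
    m = len(edges)
    label = {v: k for k, cluster in enumerate(clusters) for v in cluster}
    intra = [0] * len(clusters)
    degree = [0] * len(clusters)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    q_d = sum(intra) / m
    q_0 = sum(d * d for d in degree) / (2 * m) ** 2
    return q_d, q_0


def tuned_modularity(edges, clusters, gamma=1.0):
    """Q_D - gamma * Q_0; gamma = 1 gives the ordinary modularity."""
    q_d, q_0 = modularity_terms(edges, clusters)
    return q_d - gamma * q_0


# Toy graph (assumption): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO = [[0, 1, 2], [3, 4, 5]]   # the two triangles as clusters
ONE = [[0, 1, 2, 3, 4, 5]]     # everything in a single cluster

for gamma in (1.0, 0.1):
    print(gamma, tuned_modularity(EDGES, TWO, gamma), tuned_modularity(EDGES, ONE, gamma))
```

On this toy graph the two-triangle clustering scores higher at $\gamma = 1$, while at $\gamma = 0.1$ the single-cluster partition scores higher, which illustrates how shrinking the weight of the null-model term pushes the maximizer toward fewer clusters.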

If we strip the Qo term from Qn we are left with the intracluster edge density Qd which,
by itself, does not provide a mechanism for cluster number selection. However this does not 
necessarily mean that Qd cannot be used for community detection. Let us conclude with a 
discussion of possible enhancements which will enable the use of Qd maximization for community 
detection. 

The general approach we have in mind involves the maximization of Qd for fixed values
$K \in \{1, \ldots, K_{\max}\}$ and the subsequent selection of the optimal $K$ by use of a postprocessing
"cluster number selection" or "model order" or "cluster validity" criterion (these terms are
almost, but not totally, synonymous). A plethora of such criteria has appeared in the "classical"
pattern recognition literature. The interested reader can consult the classic book [TT] or the
review papers [101 [161 El El]. It should be mentioned that in the pattern recognition community
the cluster number selection problem has been recognized to be "a fundamental, and largely
unsolved, problem in cluster analysis" [30].

Some of the "classical" criteria have also been used in the community detection literature, 
where alternatives to modularity maximization have been explored in conjunction with cluster 
number selection using criteria based on statistics and information theory. References to criteria 
of this type appear in [12], Section IX.B; for example the Akaike Information Criterion pQ, the
Bayesian Information Criterion [29], Minimum Description Length [27], etc. The gap statistic
[32] and the knee criterion [2E] should also be mentioned at this point. 

Another way to estimate the number of clusters is through the use of some generative model, 
which essentially provides prior knowledge regarding the correct number of clusters. Some 
models of this type are discussed in [12], Section IX.A.

Yet another possibility is to use Qd maximization in conjunction with some cluster validity
criterion [HI [TBI E]. The idea is to perform Qd maximization for $K \in \{1, 2, \ldots, K_{\max}\}$ and,
for each $K$ value, grade the $K$-th clustering $V^{(K)} = \{V_1^{(K)}, \ldots, V_K^{(K)}\}$ by a clustering quality
function $Q(V, G)$ of the form

\[ Q(V^{(K)}, G) = \left( \prod_{k=1}^{K} q(V_k^{(K)}, G) \right)^{1/K}, \tag{30} \]

where $q(V, G)$ is a cluster validity index (and the power $1/K$ is used to reduce the possibility
that $F(K) = \max_{V \in \mathcal{V}_K} Q(V, G)$ is decreasing with $K$). Hence (30) makes the "global"
clustering quality $Q(V, G)$ a "separable" function of the "local" validities of the clusters $V_1^{(K)}, \ldots,
V_K^{(K)}$ and, hopefully, $F(K)$ attains a global maximum at $K = K_{true}$, the correct number of
clusters. Similar approaches have been exploited in [T51 1231 13"T].
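
A minimal Python sketch of this selection scheme follows, under clearly stated assumptions. The local validity index `cluster_validity` (the fraction of a cluster's total degree that stays inside the cluster) is a hypothetical stand-in, since the text does not fix a particular $q$. The candidate clusterings are supplied by hand rather than obtained by actual Qd maximization, and the global score is taken to be the geometric mean of the local validities. With this particular $q$ the trivial clustering $K = 1$ always scores 1, so the toy example only compares $K = 2$ and $K = 3$.

```python
import math


def cluster_validity(cluster, edges):
    """Hypothetical local index q: share of the cluster's degree that is internal."""
    members = set(cluster)
    intra = sum(1 for u, v in edges if u in members and v in members)
    degree = sum((u in members) + (v in members) for u, v in edges)
    return 2 * intra / degree if degree else 0.0


def separable_quality(clustering, edges):
    """Geometric mean of the local validities of the K clusters."""
    K = len(clustering)
    product = math.prod(cluster_validity(c, edges) for c in clustering)
    return product ** (1.0 / K)


def select_cluster_number(candidates, edges):
    """candidates maps K to the clustering chosen for that K; return (best K, scores)."""
    scores = {K: separable_quality(c, edges) for K, c in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy graph (assumption): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
CANDIDATES = {
    2: [[0, 1, 2], [3, 4, 5]],
    3: [[0, 1, 2], [3, 4], [5]],
}

best_k, scores = select_cluster_number(CANDIDATES, EDGES)
print("scores per K:", scores)
print("selected K:", best_k)
```

Because the score is a product of local terms, a single poorly formed cluster (here the isolated node in the $K = 3$ candidate) drives the global quality to zero, which is one consequence of the separable form.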